
Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites

Author: Gershenzon Naum I.
Ioshikhes Ilya P.
Stormo Gary D.
Publication venue: Oxford University Press
Publication date: 01/01/2005

Position-weight matrices (PWMs) are broadly used to locate transcription factor binding sites in DNA sequences. The majority of existing PWMs provide a low level of both sensitivity and specificity. We present a new computational algorithm, a modification of the Staden–Bucher approach, that improves the PWM. We applied the proposed technique to the PWM of the GC-box, the binding site for Sp1. A comparison of the old and new PWMs shows that the latter increases both sensitivity and specificity. The statistical parameters of the GC-box distribution in promoter regions and in the human genome, as well as in each chromosome, are presented. Most commonly used PWMs are 4-row mononucleotide matrices, although 16-row dinucleotide matrices are known to be more informative. The algorithm efficiently determines the 16-row matrices, and preliminary results show that such matrices perform better than 4-row matrices.
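
As a rough illustration of what a 4-row mononucleotide PWM does, the sketch below builds log-odds scores from a handful of aligned sites and scans a sequence for windows above a threshold. It is not the modified Staden–Bucher algorithm described in the abstract. The example sites, the uniform background of 0.25, the pseudocount, and the threshold are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, background=0.25, pseudocount=1.0):
    """Build a 4-row log-odds PWM (one dict per position) from aligned sites."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pwm, window))

def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = score_window(pwm, sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical GC-rich sites, illustrative only (not real Sp1 binding data).
example_sites = ["GGGCGG", "GGGCGG", "GGGAGG", "GGGCGG", "TGGCGG"]
pwm = build_pwm(example_sites)
hits = scan(pwm, "ATATGGGCGGATAT", threshold=5.0)
```

Raising the threshold trades sensitivity for specificity, which is the balance the abstract says the improved matrices shift in both directions at once. A 16-row dinucleotide matrix would follow the same pattern with counts indexed by adjacent base pairs instead of single bases.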

Springer - Publisher Connector

Digital Commons@Becker

CORE

The features of Drosophila core promoters revealed by statistical analysis

Author: Gershenzon Naum I
Ioshikhes Ilya P
Trifonov Edward N
Publication venue: BioMed Central
Publication date: 01/06/2006

BACKGROUND: Experimental investigation of transcription is still a very labor- and time-consuming process. Only a few transcription initiation scenarios have been studied in detail. The mechanism of interaction between basal machinery and promoter, in particular core promoter elements, is not known for the majority of identified promoters. In this study, we reveal various transcription initiation mechanisms by statistical analysis of 3393 nonredundant Drosophila promoters. RESULTS: Using Drosophila-specific position-weight matrices, we identified promoters containing TATA box, Initiator, Downstream Promoter Element (DPE), and Motif Ten Element (MTE), as well as core elements discovered in human (TFIIB Recognition Element (BRE) and Downstream Core Element (DCE)). Promoters utilizing known synergetic combinations of two core elements (TATA_Inr, Inr_MTE, Inr_DPE, and DPE_MTE) were identified. We also establish the existence of promoters with potentially novel synergetic combinations: TATA_DPE and TATA_MTE. Our analysis revealed several motifs with the features of promoter elements, including possible novel core promoter element(s). Comparison of human and Drosophila showed consistent percentages of promoters with TATA, Inr, DPE, and synergetic combinations thereof, as well as most of the same functional and mutual positions of the core elements. No statistical evidence of MTE utilization in human was found. Distinct nucleosome positioning in particular promoter classes was revealed. CONCLUSION: We present lists of promoters that potentially utilize the aforementioned elements/combinations. The number of these promoters is two orders of magnitude larger than the number of promoters in which transcription initiation was experimentally studied. The sequences are ready to be experimentally tested or used for further statistical analysis. The developed approach may be utilized for other species.

CORE

Determining Physical Mechanisms of Gene Expression Regulation from Single Cell Gene Expression Data.

Author: Adryan Boris
Ezer Daphne
Göttgens Berthold
Ioshikhes Ilya
Moignard Victoria
Publication venue: PLoS Comput Biol
Publication date: 01/08/2016

Many genes are expressed in bursts, which can contribute to cell-to-cell heterogeneity. It is now possible to measure this heterogeneity with high-throughput single cell gene expression assays (single cell qPCR and RNA-seq). These experimental approaches generate gene expression distributions which can be used to estimate the kinetic parameters of gene expression bursting, namely the rate at which genes turn on, the rate at which genes turn off, and the rate of transcription. We construct a complete pipeline for the analysis of single cell qPCR data that uses the mathematics behind bursty expression to develop more accurate and robust algorithms for analyzing the origin of heterogeneity in experimental samples, specifically an algorithm for clustering cells by their bursting behavior (Simulated Annealing for Bursty Expression Clustering, SABEC) and a statistical tool for comparing the kinetic parameters of bursty expression across populations of cells (Estimation of Parameter changes in Kinetics, EPiK). We applied these methods to hematopoiesis, including a new single cell dataset in which transcription factors (TFs) involved in the earliest branchpoint of blood differentiation were individually up- and down-regulated. We could identify two unique sub-populations within a seemingly homogeneous group of hematopoietic stem cells. In addition, we could predict regulatory mechanisms controlling the expression levels of eighteen key hematopoietic transcription factors throughout differentiation.
Detailed information about gene regulatory mechanisms can therefore be obtained simply from high-throughput single cell gene expression data, which should be widely applicable given the rapid expansion of single cell genomics. This work was supported by: Royal Society Research Fellowship, Marshall Scholarship, Medical Research Council, the Leukemia and Lymphoma Society and core support grants from the Wellcome Trust to the Cambridge Institute for Medical Research and the Wellcome Trust and MRC Cambridge Stem Cell Institute. This is the final version of the article. It first appeared from the Public Library of Science via http://dx.doi.org/10.1371/journal.pcbi.100507
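
The three kinetic parameters named in the abstract (on rate, off rate, transcription rate) correspond to the standard two-state model of bursty expression. The sketch below simulates that model with a Gillespie-style loop to show how it produces a broad distribution of counts across cells. It is not SABEC or EPiK. The function name, the parameter values, and the mRNA degradation rate `k_deg` (fixed at 1 as a time unit) are assumptions made for illustration.

```python
import random
import statistics

def simulate_cell(k_on, k_off, k_tx, k_deg=1.0, t_end=50.0, seed=0):
    """Gillespie simulation of one cell; returns the mRNA count at time t_end."""
    rng = random.Random(seed)
    t, gene_on, mrna = 0.0, False, 0
    while True:
        switch_rate = k_off if gene_on else k_on   # promoter toggles state
        tx_rate = k_tx if gene_on else 0.0         # transcription only while on
        deg_rate = k_deg * mrna                    # first-order mRNA decay (assumed)
        total = switch_rate + tx_rate + deg_rate
        t += rng.expovariate(total)                # waiting time to the next event
        if t >= t_end:
            return mrna
        r = rng.random() * total                   # pick which event fires
        if r < switch_rate:
            gene_on = not gene_on
        elif r < switch_rate + tx_rate:
            mrna += 1
        else:
            mrna -= 1

# Hypothetical parameters: rare activation, fast inactivation, strong transcription.
counts = [simulate_cell(k_on=0.5, k_off=2.0, k_tx=20.0, seed=i) for i in range(200)]
mean_count = statistics.mean(counts)
fano = statistics.variance(counts) / mean_count   # values above 1 indicate bursting
```

A Fano factor well above 1 is the signature that separates bursty expression from a simple Poisson process, which is why the shape of the single cell distribution carries information about the underlying rates.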

Warwick Research Archives Portal Repository

Apollo (Cambridge)

White Rose Research Online

FigShare

Snf2h Primes UL Neuron Production

Author: Alvarez-Saavedra Matías
Chaudary Nidhi
De Repentigny Yves
Hashem Lukas E.
Hirayama Teruyoshi
Ioshikhes Ilya
Kothary Rashmi
Picketts David J.
Sarwar Shihab
Yagi Takeshi
Yan Keqin
Yang Doo
Publication venue: Frontiers Media S.A.
Publication date: 10/02/2021

Alterations in the homeostasis of either cortical progenitor pool, namely the apically located radial glial (RG) cells or the basal intermediate progenitors (IPCs), can severely impair cortical neuron production. Such changes are reflected by microcephaly and are often associated with cognitive defects. Genes encoding epigenetic regulators are a frequent cause of intellectual disability and many have been shown to regulate progenitor cell growth, including our inactivation of the Smarca1 gene encoding Snf2l, which is one of two ISWI mammalian orthologs. Loss of the Snf2l protein resulted in dysregulation of Foxg1 and IPC proliferation leading to macrocephaly. Here we show that the closely related Smarca5 gene encoding the Snf2h chromatin remodeler is necessary for embryonic IPC expansion and subsequent specification of callosal projection neurons. Telencephalon-specific Smarca5 cKO embryos have impaired cell cycle kinetics and increased cell death, resulting in fewer Tbr2+ and FoxG1+ IPCs by mid-neurogenesis. These deficits give rise to adult mice with a dramatic reduction in Satb2+ upper layer neurons, and partial agenesis of the corpus callosum. Mice survive into adulthood but molecularly display reduced expression of the clustered protocadherin genes that may further contribute to altered dendritic arborization and a hyperactive behavioral phenotype. Our studies provide novel insight into the developmental function of Snf2h-dependent chromatin remodeling processes during brain development.

Tokushima University Institutional Repository

Restriction landmark genomic scanning (RLGS) spot identification by second generation virtual RLGS in multiple genomes with multiple enzyme combinations.

Author: Ansari Tahmina
Camoriano Marta
Chen Shih-Shih
Costello Joseph
Dai Zunyan
Dawson David W
Elliott Rosemary
Held William
Hong Jason S
Ioshikhes Ilya
Kazhiyur-Mannar Ramakrishnan
Liang Ping
Liu Chunhui
Oakes Christopher C
Plass Christoph
Rush Laura J
Smiraglia Dominic J
Smith Laura T
Song Fei
Su Jian
Szafranek Angela
Teitell Michael A
Trasler Jacquetta M
Wang Shu-Huei
Wenger Rephael
Wu Yue-Zhong
Yu Li
Publication venue: eScholarship, University of California
Publication date: 01/01/2007

Background: Restriction landmark genomic scanning (RLGS) is one of the most successfully applied methods for the identification of aberrant CpG island hypermethylation in cancer, as well as the identification of tissue-specific methylation of CpG islands. However, a limitation to the utility of this method has been the ability to assign specific genomic sequences to RLGS spots, a process commonly referred to as "RLGS spot cloning." Results: We report the development of a virtual RLGS method (vRLGS) that allows for RLGS spot identification in any sequenced genome and with any enzyme combination. We report significant improvements in predicting DNA fragment migration patterns by incorporating sequence information into the migration models, and demonstrate a median Euclidean distance between actual and predicted spot migration of 0.18 centimeters for the most complex human RLGS pattern. We report the confirmed identification of 795 human and 530 mouse RLGS spots for the most commonly used enzyme combinations. We also developed a method to filter the virtual spots to reduce the number of extra spots seen on a virtual profile for both the mouse and human genomes. We demonstrate use of this filter to simplify spot cloning and to assist in the identification of spots exhibiting tissue-specific methylation. Conclusion: The new vRLGS system reported here is highly robust for the identification of novel RLGS spots. The migration models developed are not specific to the genome being studied or the enzyme combination being used, making this tool broadly applicable. The identification of hundreds of mouse and human RLGS spot loci confirms the strong bias of RLGS studies to focus on CpG islands and provides a valuable resource to rapidly study their methylation.
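
The accuracy figure quoted in the abstract is a median Euclidean distance between actual and predicted spot positions on the two-dimensional gel. A minimal sketch of that metric follows. The function name and the coordinates are hypothetical, and the code does not reproduce the vRLGS migration models themselves.

```python
import math
import statistics

def median_spot_error(actual, predicted):
    """Median Euclidean distance between paired (x, y) spot positions."""
    distances = [math.dist(a, p) for a, p in zip(actual, predicted)]
    return statistics.median(distances)

# Hypothetical spot coordinates in centimeters (illustrative only).
actual_spots = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
predicted_spots = [(1.0, 2.1), (3.3, 4.4), (5.0, 6.0)]
error_cm = median_spot_error(actual_spots, predicted_spots)
```

The median is less sensitive than the mean to a few badly mispredicted spots, which makes it a reasonable summary when most predictions are close and a handful are far off.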

Springer - Publisher Connector

eScholarship - University of California

Mapping Dynamic Histone Acetylation Patterns to Gene Expression in Nanog-depleted Murine Embryonic Stem Cells

Embryonic stem cells (ESC) have the potential to self-renew indefinitely and to differentiate into any of the three germ layers. The molecular mechanisms for self-renewal, maintenance of pluripotency and lineage specification are poorly understood, but recent results point to a key role for epigenetic mechanisms. In this study, we focus on quantifying the impact of histone 3 acetylation (H3K9,14ac) on gene expression in murine embryonic stem cells. We analyze genome-wide histone acetylation patterns and gene expression profiles measured over the first five days of cell differentiation triggered by silencing Nanog, a key transcription factor in ESC regulation. We explore the temporal and spatial dynamics of histone acetylation data and its correlation with gene expression using supervised and unsupervised statistical models. On a genome-wide scale, changes in acetylation are significantly correlated with changes in mRNA expression and, surprisingly, this coherence increases over time. We quantify the predictive power of histone acetylation for gene expression changes in a balanced cross-validation procedure. In an in-depth study we focus on genes central to the regulatory network of mouse ESC, including those identified in a recent genome-wide RNAi screen and in the PluriNet, a computationally derived stem cell signature. We find that compared to the rest of the genome, ESC-specific genes show significantly more acetylation signal and a much stronger decrease in acetylation over time, which is often not reflected in a concordant expression change. These results shed light on the complexity of the relationship between histone acetylation and gene expression and are a step toward dissecting the multilayer regulatory mechanisms that determine stem cell fate. Comment: accepted at PLoS Computational Biology

arXiv.org e-Print Archive

Harvard University - DASH

The University of Manchester - Institutional Repository

Comparison of Insertional RNA Editing in Myxomycetes

Author: AC Rhee
AL Chateigner-Boutin
B Blum
C Beargie
Cai Chen
D Miller
David Frankhouser
E Kotera
EM Byrne
H Takano
HC Smith
Ilya Ioshikhes
JM Gott
LM Visomirski-Robic
LP Keegan
MA Larkin
PG Hendrickson
R Benne
R Bundschuh
R Mahendran
Ralf Bundschuh
SF Altschul
SJ Traphagen
T Liu
TL Horton
U Krishnan
V Knoop
Publication venue: Public Library of Science
Publication date: 01/01/2012

RNA editing describes the process in which individual or short stretches of nucleotides in a messenger or structural RNA are inserted, deleted, or substituted. A high level of RNA editing has been observed in the mitochondrial genome of Physarum polycephalum. The most frequent editing type in Physarum is the insertion of individual Cs. RNA editing is extremely accurate in Physarum; however, little is known about its mechanism. Here, we demonstrate how analyzing two organisms from the Myxomycetes, namely Physarum polycephalum and Didymium iridis, allows us to test hypotheses about the editing mechanism that cannot be tested from a single organism alone. First, we show that using the recently determined full transcriptome information of Physarum dramatically improves the accuracy of computational editing site prediction in Didymium. We use this approach to predict genes in the mitochondrial genome of Didymium and identify six new edited genes as well as one new gene that appears unedited. Next, we investigate sequence conservation in the vicinity of editing sites between the two organisms in order to identify sites that harbor the information for the location of editing sites based on increased conservation. Our results imply that the information contained within only nine or ten nucleotides on either side of the editing site (a distance previously suggested through experiments) is not enough to locate the editing sites. Finally, we show that the codon position bias in C insertional RNA editing of these two organisms is correlated with the selection pressure on the respective genes, thereby directly testing an evolutionary theory on the origin of this codon bias. Beyond revealing interesting properties of insertional RNA editing in Myxomycetes, our work suggests possible approaches to be used when finding sequence motifs for any biological process fails.

Public Library of Science (PLOS)

Conformation Regulation of the X Chromosome Inactivation Center: A Model

Author: A Gimelbrant
A Wutz
Antonella Prisco
Antonio Scialdone
C Chureau
C Lanctôt
CL Tsai
D Chandler
D Tian
E Heard
E Splinter
EM Pugacheva
GD Penny
I Jonkers
I Okamoto
Ilaria Cataudella
Ilya Ioshikhes
IM van den Berg
J Chaumeil
J Chow
J Starmer
JT Lee
K Monkhorst
M Doi
M Nicodemi
M Renda
Mariano Barbieri
Mario Nicodemi
ME Donohoe
P Avner
P Clerc
P Fraser
P Navarro
S Augui
S Vigneau
T Misteli
T Sado
WW Quitschke
Y Marahrens
Y Ogawa
Publication venue: Public Library of Science
Publication date: 01/01/2011

X-Chromosome Inactivation (XCI) is the process whereby one randomly chosen X chromosome becomes transcriptionally silenced in female cells. XCI is governed by the Xic, a locus on the X encompassing an array of genes which interact with each other and with key molecular factors. However, the mechanism establishing the fate of the two X chromosomes, and the corresponding alternative modifications of the Xic architecture, is still unknown. In this study, by use of computer simulations, we explore the scenario where chromatin conformations emerge from the interaction of chromatin with diffusing molecular factors. Our aim is to understand the physical mechanisms whereby stable, non-random conformations are established on the Xic's, how complex architectural changes are reliably regulated, and how they lead to opposite structures on the two alleles. In particular, comparison against current experimental data indicates that a few key cis-regulatory regions orchestrate the organization of the Xic, and that two major molecular regulators are involved.

Archivio della ricerca - Università degli studi di Napoli Federico II

Copenhagen University Research Information System

An Integrated Model of Multiple-Condition ChIP-Seq Data Reveals Predeterminants of Cdx2 Binding

Author: A Arvey
A Marson
A Meissner
AC Mullen
AK Tewari
Akshay Kakumanu
B Langmead
C Taslim
Carolyn A. Morrison
D Strumpf
David K. Gifford
E Redhead
EO Mazzoni
Esteban O. Mazzoni
H Ji
H Niwa
H Xu
HS Rhee
Hynek Wichterle
Ilya Ioshikhes
J-CD Heng
JA Granek
JA Stamatoyannopoulos
JP Ferguson
K Liang
KS Zaret
M Berger
M Ku
MAT Figueiredo
Matthew D. Edwards
MD Robinson
MH Kagey
MP Creyghton
P Huggins
PB Rahl
R Jothi
RI Sherwood
Richard I. Sherwood
S John
S Mahony
SG Landt
Shaun Mahony
TL Bailey
TS Mikkelsen
X Chen
X Zeng
Y Guo
Y Zhang
Z Shao
Publication venue: Public Library of Science (PLoS)
Publication date: 01/10/2013

Regulatory proteins can bind to different sets of genomic targets in various cell types or conditions. To reliably characterize such condition-specific regulatory binding we introduce MultiGPS, an integrated machine learning approach for the analysis of multiple related ChIP-seq experiments. MultiGPS is based on a generalized Expectation Maximization framework that shares information across multiple experiments for binding event discovery. We demonstrate that our framework enables the simultaneous modeling of sparse condition-specific binding changes, sequence dependence, and replicate-specific noise sources. MultiGPS encourages consistency in reported binding event locations across multiple-condition ChIP-seq datasets and provides accurate estimation of ChIP enrichment levels at each event. MultiGPS's multi-experiment modeling approach thus provides a reliable platform for detecting differential binding enrichment across experimental conditions. We demonstrate the advantages of MultiGPS with an analysis of Cdx2 binding in three distinct developmental contexts. By accurately characterizing condition-specific Cdx2 binding, MultiGPS enables novel insight into the mechanistic basis of Cdx2 site selectivity. Specifically, the condition-specific Cdx2 sites characterized by MultiGPS are highly associated with pre-existing genomic context, suggesting that such sites are pre-determined by cell-specific regulatory architecture. However, MultiGPS-defined condition-independent sites are not predicted by pre-existing regulatory signals, suggesting that Cdx2 can bind to a subset of locations regardless of genomic environment. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5. National Science Foundation (U.S.) (Graduate Research Fellowship under Grant 0645960); National Institutes of Health (U.S.) (grant P01 NS055923); Pennsylvania State University. Center for Eukaryotic Gene Regulation

CiteSeerX

Public Library of Science (PLOS)

DSpace@MIT

Harvard University - DASH

Columbia University Academic Commons